{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "95f0a171",
   "metadata": {},
   "source": [
    "(functions)=\n",
    "# Functions\n",
    "\n",
    "## Introduction\n",
    "\n",
    "One of the best ways to improve your reach as a data scientist is to write functions.\n",
    "Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting.\n",
    "Writing a function has three big advantages over using copy-and-paste:\n",
    "\n",
    "1.  You can give a function an evocative name that makes your code easier to understand.\n",
    "\n",
    "2.  As requirements change, you only need to update code in one place, instead of many.\n",
    "\n",
    "3.  You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).\n",
    "\n",
    "Writing good functions is a lifetime journey.\n",
    "Even after using Python for many years you can still learn new techniques and better ways of approaching old problems.\n",
    "The goal of this chapter is not to teach you every esoteric detail of functions but to get you started with some pragmatic advice that you can apply immediately.\n",
    "\n",
    "As well as practical advice for writing functions, this chapter also gives you some suggestions for how to style your code.\n",
    "Good code style is like correct punctuation.\n",
    "Youcanmanagewithoutit, but it sure makes things easier to read!\n",
    "As with styles of punctuation, there are many possible variations.\n",
    "Here we present the style we use in our code, but the most important thing is to be consistent.\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "You will need the **pandas** and **numpy** packages for this chapter."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4b99b14",
   "metadata": {},
   "source": [
    "## Functions\n",
    "\n",
    "A function has inputs, it performs its function, and it returns any outputs. Functions begin with a def keyword for ‘define a function’. It then has a name, followed by brackets, (), which may contain function arguments and function keyword arguments. This is followed by a colon. The body of the function is then indented relative to the left-most text. Function arguments are defined in brackets following the name, with different inputs separated by commas. Any outputs are given with the return keyword, again with different variables separated by commas.\n",
    "\n",
    "Let's see a very simple example of a function with a single *argument* (or arg):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0450ad6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def welcome_message(name):\n",
    "    return f\"Hello {name}, and welcome!\"\n",
    "\n",
    "\n",
    "# Without indentation, this code is not part of function\n",
    "name = \"Ada\"\n",
    "output_string = welcome_message(name)\n",
    "print(output_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08c06c6a",
   "metadata": {},
   "source": [
    "One powerful feature of functions is that we can define defaults for the input arguments. These are called *keyword arguments* (or kwargs). Let's see that in action by defining a default value for `name`, along with multiple outputs--a hello message and a score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd49bee5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def score_message(score, name=\"student\"):\n",
    "    \"\"\"This is a doc-string, a string describing a function.\n",
    "    Args:\n",
    "        score (float): Raw score\n",
    "        name (str): Name of student\n",
    "    Returns:\n",
    "        str: A hello message.\n",
    "        float: A normalised score.\n",
    "    \"\"\"\n",
    "    norm_score = (score - 50) / 10\n",
    "    return f\"Hello {name}\", norm_score\n",
    "\n",
    "\n",
    "# Without indentation, this code is not part of function\n",
    "name = \"Ada\"\n",
    "score = 98\n",
    "# No name entered\n",
    "print(score_message(score))\n",
    "# Name entered\n",
    "print(score_message(score, name=name))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c4c6cdf",
   "metadata": {},
   "source": [
    "```{admonition} Arguments and keyword arguments\n",
    ":class: tip\n",
    "\n",
    "*arguments* are the variables that functions *always* need, so `a` and `b` in `def add(a, b): return a + b`. The function won't work without them! Function arguments are sometimes referred to as *args*.\n",
    "\n",
    "*Keyword arguments* are the variables that are optional for functions, so `c` in `def add(a, b, c=5): return a + b - c`. If you do not provide a value for `c` when calling the function, it will automatically revert to `c=5`. Keyword arguments are sometimes referred to as *kwargs*.\n",
    "```\n",
    "\n",
    "\n",
    "```{admonition} Exercise\n",
    "What is the return type of a function with multiple return values separated by commas following the `return` statement?\n",
    "```\n",
    "\n",
    "In that last example, you'll notice that we added some text to the function. This is a doc-string, or documentation string. It's there to help users (and, most likely, future you) to understand what the function does. Let's see how this works in action by calling `help()` on the `score_message` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50dfff24",
   "metadata": {},
   "outputs": [],
   "source": [
    "help(score_message)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e466fcd",
   "metadata": {},
   "source": [
    "```{admonition} Exercise\n",
    "Write a function that returns a high five unicode character if the input is equal to \"coding for economists\" and a sad face, \":-/\" otherwise.\n",
    "\n",
    "Add a second argument that takes a default argument of an empty string but, if used, is added (concatenated) to the return message. Use it to create the return output, \":-/ here is my message.\"\n",
    "\n",
    "Write a doc-string for your function and call `help` on it.\n",
    "```\n",
    "\n",
    "To learn more about args and kwargs, check out these [short video tutorials](https://calmcode.io/args-kwargs/introduction.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58c754c0",
   "metadata": {},
   "source": [
    "## When should you write a function?\n",
    "\n",
    "You should consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).\n",
    "For example, take a look at this code.\n",
    "What does it do?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a744085",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.normal(size=(10, 4)), columns=[\"a\", \"b\", \"c\", \"d\"])\n",
    "\n",
    "df[\"a\"] = (df[\"a\"] - df[\"a\"].min()) / (df[\"a\"].max() - df[\"a\"].min())\n",
    "df[\"b\"] = (df[\"b\"] - df[\"b\"].min()) / (df[\"b\"].max() - df[\"a\"].min())\n",
    "df[\"c\"] = (df[\"c\"] - df[\"c\"].min()) / (df[\"c\"].max() - df[\"c\"].min())\n",
    "df[\"d\"] = (df[\"d\"] - df[\"d\"].min()) / (df[\"d\"].max() - df[\"d\"].min())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5da40e0",
   "metadata": {},
   "source": [
    "You might be able to puzzle out that this rescales each column to have a range from 0 to 1.\n",
    "But did you spot the mistake? There was an error when copying-and-pasting the code for `df[\"b\"]`: someone forgot to change an `a` to a `b`.\n",
    "Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.\n",
    "\n",
    "To write a function you need to first analyse the code.\n",
    "How many inputs does it have?\n",
    "\n",
    "```python\n",
    "df[\"a\"] - df[\"a\"].min() / (df[\"a\"].max() - df[\"a\"].min())\n",
    "```\n",
    "\n",
    "This code only has one input: `df[\"a\"]`. To make the inputs more clear, it's a good idea to rewrite the code using temporary variables with general names. Here this code only requires a single numeric vector, so we'll call it `x` and put it into a function.\n",
    "\n",
    "Functions begin with a def keyword for ‘define a function’. It then has a name, followed by brackets, (), which may contain function arguments and function keyword arguments. This is followed by a colon. The body of the function is then indented relative to the left-most text. Function arguments are defined in brackets following the name, with different inputs separated by commas. Any outputs are given with the return keyword, again with different variables separated by commas.\n",
    "\n",
    "So, in Python, functions have the form:\n",
    "\n",
    "```python\n",
    "def name_of_function(<inputs>):\n",
    "    <code to carry out on inputs>\n",
    "    <return statement, if appropriate>\n",
    "```\n",
    "\n",
    "So here it would be:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a16d6dd0",
   "metadata": {},
   "source": [
    "```python\n",
    "def rescale(x):\n",
    "    return (x - x.min()) / (x.max() - x.min())\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22f6afb7",
   "metadata": {},
   "source": [
    "There is still some duplication in this code. We're computing the minimum of the data twice, so it makes sense to do it in once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "406648b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def rescale(x):\n",
    "    minimum = x.min()\n",
    "    return (x - minimum) / (x.max() - minimum)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60609a8f",
   "metadata": {},
   "source": [
    "Pulling out intermediate calculations into named variables is a good practice because it makes it more clear what the code is doing.\n",
    "\n",
    "\n",
    "There are three key steps to creating a new function:\n",
    "\n",
    "1.  You need to pick a *name* for the function. Here we've used `rescale` because this function rescales a vector to lie between 0 and 1.\n",
    "\n",
    "1.  You list the inputs, or *arguments*, to the function inside `function`.\n",
    "    Here we have just one argument.\n",
    "    If we had more the call would look like `function(x, y, z)`. (We might also have a named *keyword argument* such as `data=` following the *arguments*.)\n",
    "\n",
    "3.  You place the code you have developed in the *body* of the function, a block that immediately follows `function(...):`.\n",
    "\n",
    "Note the overall process: we only made the function after we'd figured out how to make it work with a simple input. It's easier to start with working code and turn it into a function; it's harder to create a function and then try to make it work.\n",
    "\n",
    "At this point it's a good idea to check your function with a few different inputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b72eafca",
   "metadata": {},
   "outputs": [],
   "source": [
    "rescale(pd.Series([-10, 0, 10]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "edd018c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "rescale(pd.Series([1, 2, 3, np.nan, 5]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "629f75d4",
   "metadata": {},
   "source": [
    "As you write more and more functions you'll eventually want to convert these informal, interactive tests into formal, automated tests.\n",
    "That process is called unit testing. Unfortunately, it's beyond the scope of this book.\n",
    "\n",
    "We can simplify the original example now that we have a function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce066e4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"a\"] = rescale(df[\"a\"])\n",
    "df[\"b\"] = rescale(df[\"b\"])\n",
    "df[\"c\"] = rescale(df[\"c\"])\n",
    "df[\"d\"] = rescale(df[\"d\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb41766f",
   "metadata": {},
   "source": [
    "Compared to the original, this code is easier to understand and we've eliminated one class of copy-and-paste errors. There is still quite a bit of duplication since we're doing the same thing to multiple columns; we can actually remove this too, but we'll cover how to do that later in the book.\n",
    "\n",
    "Another advantage of functions is that if our requirements change, we only need to make the change in one place.\n",
    "For example, we might discover that some of our variables include infinite values, and `rescale()` (effectively) fails:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f503c36b",
   "metadata": {},
   "outputs": [],
   "source": [
    "rescale(pd.Series([1, 2, 3, np.inf, 5]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f29a050e",
   "metadata": {},
   "source": [
    "Because we've extracted the code into a function, we only need to make the fix in one place:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67c861b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def rescale(x):\n",
    "    x = x.replace(np.inf, np.nan)\n",
    "    minimum = x.min()\n",
    "    return (x - minimum) / (x.max() - minimum)\n",
    "\n",
    "\n",
    "rescale(pd.Series([1, 2, 3, np.inf, 5]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee8598d3",
   "metadata": {},
   "source": [
    "This is an important part of the \"do not repeat yourself\" (or DRY) principle.\n",
    "The more repetition you have in your code, the more places you need to remember to update when things change (and they always do!), and the more likely you are to create bugs over time."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e5df698",
   "metadata": {},
   "source": [
    "## Functions are for humans and computers\n",
    "\n",
    "It's important to remember that functions are not just for the computer, but are also for humans. For the most part, Python doesn't care what your function is called, or what comments it contains, but these are important for human readers. This section discusses some things that you should bear in mind when writing functions that humans can understand.\n",
    "\n",
    "The name of a function is important. Ideally, the name of your function will be short, but clearly evoke what the function does. That's hard! But it's better to be clear than short, as Visual Studio Code's autocomplete makes it easy to type long names.\n",
    "\n",
    "Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (i.e. `mean` is better than `compute_mean`), or accessing some property of an object (i.e. `coef` is better than `get_coefficients`). A good sign that a noun might be a better choice is if you're using a very broad verb like \"get\", \"compute\", \"calculate\", or \"determine\".\n",
    "Use your best judgement and don't be afraid to rename a function if you figure out a better name later.\n",
    "\n",
    "```python\n",
    "# Not a verb, or descriptive\n",
    "my_awesome_function()\n",
    "# Long, but clear\n",
    "impute_missing()\n",
    "collapse_years()\n",
    "```\n",
    "\n",
    "If your function name is composed of multiple words, use \"snake_case\", where each lowercase word is separated by an underscore. If you have a family of functions that do similar things, make sure they have consistent names and arguments. Use a common prefix to indicate that they are connected. That's better than a common suffix because autocomplete allows you to type the prefix and see all the members of the family.\n",
    "\n",
    "```python\n",
    "# Good\n",
    "input_select()\n",
    "input_checkbox()\n",
    "input_text()\n",
    "# Not so good\n",
    "select_input()\n",
    "checkbox_input()\n",
    "text_input()\n",
    "```\n",
    "\n",
    "A good example of this design is the **pandas** package: if you don't remember exactly which function you need to read in data, you can type `pd.read_` and jog your memory as the autocomplete brings up the options."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6735f101",
   "metadata": {},
   "source": [
    "## Function Scope\n",
    "\n",
    "Scope refers to what parts of your code can see what other parts. There are three different scopes to bear in mind: local, global, and non-local.\n",
    "\n",
    "**Local**\n",
    "\n",
    "If you define a variable inside a function, the rest of your code won't be able to 'see' it or use it. For example, here's a function that creates a variable and then an example of calling that variable:\n",
    "\n",
    "```python\n",
    "def var_func():\n",
    "    str_variable = 'Hello World!'\n",
    "\n",
    "var_func()\n",
    "print(str_variable)\n",
    "```\n",
    "\n",
    "This would raise an error, because as far as your general code is concerned `str_variable` doesn't exist outside of the function. This is an example of a *local* variable, one that only exists within a function.\n",
    "\n",
    "\n",
    "If you want to create variables inside a function and have them persist, you need to explicitly pass them out using, for example `return str_variable` like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8725b986",
   "metadata": {},
   "outputs": [],
   "source": [
    "def var_func():\n",
    "    str_variable = \"Hello World!\"\n",
    "    return str_variable\n",
    "\n",
    "\n",
    "returned_var = var_func()\n",
    "print(returned_var)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddd6f63e",
   "metadata": {},
   "source": [
    "**Global**\n",
    "\n",
    "A variable declared outside of a function is known as a global variable because it is accessible everywhere:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0a4fe15",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = \"I'm a global variable\"\n",
    "\n",
    "\n",
    "def print_y():\n",
    "    print(\"y is inside a function:\", y)\n",
    "\n",
    "\n",
    "print_y()\n",
    "print(\"y is outside a function:\", y)"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "9d7534ecd9fbc7d385378f8400cf4d6cb9c6175408a574f1c99c5269f08771cc"
  },
  "jupytext": {
   "cell_metadata_filter": "-all",
   "encoding": "# -*- coding: utf-8 -*-",
   "formats": "md:myst",
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "toc-showtags": true
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
